Hta, ev, w, w2£3X& 0
[0061]

[...] First, the product of ev(i+1:n, 1:iwidth) and a(i+1:n, i) is formed, and ev is updated using alpha. Next, for the block with start position is and end position ie, w(1:blk, iwidth) is computed from a(is+1:n, is:ie)^t and ev(is+1:n, 1:iwidth), and in the same way w(1:blk, 1:iwidth) is computed from TRL(a(ie+1:is, is:ie))^t and ev(ie+1:is, 1:iwidth). Here, TRL denotes the lower triangular matrix.
[0062]

The diagonal elements of w2, DIAG(w2), are set from the diagonal elements of a(is:ie, is:ie). In the next do loop, w2(i1, i2) is set to w2(i1, i2) * (a(is+i2:n, is+i2-1)^t * a(is+i2:n, is+i1-1)). In the following do loop, w2(i1, i2) is updated to w2(i1, i2) + w2(i1, i1+1:i2-1) * w2(i1+1:i2-1, i2).
[0063]

Then, in the last do loop, w2(i1, i2) is set to w2(i1, i2) * w2(i2, i2), and ev(is+1:n, 1:iwidth) and ev(ie+1:is, 1:iwidth) are updated using w(1:blk, 1:iwidth).

[0064]

FIGS. 19 to 29 are flowcharts representing the pseudocode. FIG. 19 is a flowchart of subroutine trid, which performs the tridiagonalization. In step S10, the shared arrays A(k, n), diag(n) and sdiag(n) are received as arguments of the subroutine. diag and sdiag return the diagonal and subdiagonal elements of the computed tridiagonal matrix. The work areas U(n+1, iblk) and V(n+1, iblk) are allocated inside the routine and used with the shared attribute. In step S11, threads are generated. In each thread, the total number of threads is set in numthrd and the thread's own number in nothrd, both held in local areas. The block width iblk is set, and nb=(n-2+iblk-1)/iblk, nbase=0 and i=1 are set. In step S12, it is determined whether i>nb-1. If the determination in step S12 is YES, the process proceeds to step S19. If the determination in step S12 is NO, then in step S13 nbase=(i-1)×iblk, istart=1 and nwidth=iblk are set. In step S14, subroutine copy is called to copy the lower triangular part into the upper triangular part. In step S15, the block to be tridiagonalized is set in U, that is, U(nbase+1:n, 1:iblk) ← A(nbase+1:n, nbase+1:nbase+iblk). In step S16, subroutine blktrid is called to tridiagonalize the region copied into U (istart=1 and the block width iblk are passed). In step S17, the processed region is written back to array A, that is, A(nbase+1:n, nbase+1:nbase+iblk) ← U(nbase+1:n, 1:iblk). In step S18, subroutine update is called to update the lower triangular part of A(nbase+iblk:n, nbase+iblk:n), and the process returns to step S12.

[0065]

In step S19, nbase=(nb-1)×iblk, istart=1 and iblk2=n-nbase are set. In step S20, the last block is set in U, that is, U(nbase+1:n, 1:nwidth) ← A(nbase+1:n, nbase+1:n). In step S21, subroutine blktrid is called to tridiagonalize the region copied into U (istart=1 and the block width iblk2 are passed). In step S22, the processed region is written back to array A, that is, A(nbase+1:n, nbase+1:n) ← U(nbase+1:n, 1:nwidth). In step S23, the threads generated for the parallel processing are eliminated.
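The block loop of steps S11 to S22 can be sketched as follows. This is a minimal Python illustration rather than the code of the specification: the helper name block_plan is hypothetical, integer division is assumed for nb=(n-2+iblk-1)/iblk, n>2 is assumed, and the function only lists the offset nbase and the width that each call to blktrid would receive.

```python
def block_plan(n, iblk):
    """Return the (nbase, width) pairs visited by the block loop.

    Hypothetical helper. It mirrors nb = (n-2+iblk-1)/iblk with integer
    division, the nb-1 blocks of width iblk, and the final block that
    takes the remaining n-nbase columns. Assumes n > 2.
    """
    nb = (n - 2 + iblk - 1) // iblk      # number of blocks
    plan = []
    for i in range(1, nb):               # i = 1 .. nb-1 (steps S12 to S18)
        nbase = (i - 1) * iblk
        plan.append((nbase, iblk))
    nbase = (nb - 1) * iblk              # last block (steps S19 to S22)
    plan.append((nbase, n - nbase))
    return plan


if __name__ == "__main__":
    print(block_plan(10, 4))
```

With n=10 and iblk=4 the sketch yields one block of width 4 at offset 0, followed by a final block that covers the remaining 6 columns.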

[0066]

FIG. 20 is a flowchart of subroutine blktrid. This subroutine is a recursive program. It is called as

subroutine blktrid(A, k, n, diag, sdiag, nbase, istart, nwidth, U, V, nothrd, numthrd)

Here, nbase is the offset indicating the position of the block, and istart is the offset, within the block, of the smaller block obtained by recursive division; its initial value is 1. nwidth is the block width.

In step S25, it is determined whether nwidth<10. If the determination in step S25 is NO, the process proceeds to step S27. If the determination in step S25 is YES, then in step S26 subroutine btunit is called to perform the tridiagonalization, and the subroutine returns. In step S27, the start position and the width are updated for the recursive call, that is, istart2=istart and nwidth2=nwidth/2 are set. In step S28, subroutine blktrid is called recursively. In step S29, barrier synchronization is taken between the threads. In step S30, the start and end positions (is2, is3) and (ie2, ie3) used for the update and the part handled by each thread are computed. That is, istart3=istart+nwidth/2, nwidth3=nwidth-nwidth/2, is2=istart2, ie2=istart+nwidth2-1, is3=istart3, ie3=istart3+nwidth3-1, iptr=nbase+istart3, len=(n-iptr+numthrd-1)/numthrd, is=iptr+(nothrd-1)×len+1 and ie=min(n, iptr+nothrd×len) are computed. In step S31, U(is:ie, is3:ie3)=U(is:ie, is3:ie3)-U(is:ie, is2:ie2)×W(is3:ie3, is2:ie2)^t-W(is:ie, is2:ie2)×U(is3:ie3, is2:ie2)^t is computed. In step S32, barrier synchronization is taken between the threads; subroutine blktrid is then called recursively, and the subroutine is exited.
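The recursive halving of steps S25 to S32 can be illustrated with a short sketch. The function name leaf_ranges is hypothetical, and the sketch only records which (istart, nwidth) sub-blocks would reach the non-recursive routine btunit. The barrier synchronization and the update of the right half that the text performs between the two recursive calls appear only as a comment.

```python
def leaf_ranges(istart, nwidth, cutoff=10):
    """List the (istart, nwidth) sub-blocks handed to the leaf routine.

    Hypothetical helper: a block narrower than `cutoff` is processed
    directly; otherwise the left half is processed first and then the
    right half, as in the flow of blktrid.
    """
    if nwidth < cutoff:
        return [(istart, nwidth)]
    nwidth2 = nwidth // 2                       # left half width
    left = leaf_ranges(istart, nwidth2, cutoff)
    # In the text, columns is3:ie3 are updated here from columns is2:ie2
    # (step S31), with barrier synchronization before and after.
    istart3 = istart + nwidth // 2              # right half start
    nwidth3 = nwidth - nwidth // 2              # right half width
    right = leaf_ranges(istart3, nwidth3, cutoff)
    return left + right


if __name__ == "__main__":
    print(leaf_ranges(1, 25))
```

For a block of width 25 the sketch produces four leaves of widths 6, 6, 6 and 7, all below the cutoff of 10 used in step S25.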
[0067]

FIGS. 21 and 22 are flowcharts of subroutine btunit, an internal routine of subroutine blktrid.

In step S35, tmp(numthrd), sigma and alpha are allocated with the shared attribute. In step S36, it is determined whether nbase+istart>n-2. If the determination in step S36 is YES, the subroutine is exited. If the determination in step S36 is NO, the process proceeds to step S38. In step S38, i=istart is set, and in step S39 it is determined whether i<=istart-1+nwidth. [...] In step S40, the start position is and the end position ie of the part handled by each thread are computed. That is, iptr2=nbase+i, len=(n-iptr2+numthrd-1)/numthrd, is=iptr2+(nothrd-1)×len+1 and ie=min(n, iptr2+nothrd×len) are computed. In step S41, barrier synchronization is taken. In step S42, tmp(nothrd)=U(is:ie, i)^t×U(is:ie, i) is computed. In step S43, barrier synchronization is taken. In step S44, it is determined whether nothrd=1. If the determination in step S44 is NO, the process proceeds to step S46. If the determination in step S44 is YES, then in step S45 the quantities needed for the tridiagonalization are computed (creation of the Householder vector): sigma=sqrt(sum(tmp(1:numthrd))), where sum denotes the sum and sqrt the square root, diag(iptr2)=u(iptr2, i), sdiag(iptr2)=-sigma, U(nbase+i+1, i)=U(nbase+i+1, i)+sign(u(nbase+i+1, i))×sigma, alpha=1.0/(sigma×u(nbase+i+1, i)) and U(iptr2, i)=alpha, and the process proceeds to step S46. In step S46, barrier synchronization is taken. In step S47, iptr3=iptr2+1 is set. In step S48, V(is:ie, i)=A(iptr3:n, iptr2+is:iptr2+ie)^t×u(iptr3:n, i) is computed. In step S49, barrier synchronization is taken.
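The per-thread range computed in step S40 follows directly from the three formulas in the text. The sketch below is illustrative only: the function name thread_rows is hypothetical, integer division is assumed, and the ranges are 1-based and inclusive as in the specification. A thread whose start exceeds its end has nothing to do.

```python
def thread_rows(n, iptr, numthrd):
    """Return the (is, ie) row range of every thread, 1-based inclusive.

    Hypothetical helper implementing
      len = (n-iptr+numthrd-1)/numthrd
      is  = iptr+(nothrd-1)*len+1
      ie  = min(n, iptr+nothrd*len)
    for nothrd = 1 .. numthrd.
    """
    length = (n - iptr + numthrd - 1) // numthrd   # rows per thread, rounded up
    ranges = []
    for nothrd in range(1, numthrd + 1):
        i_start = iptr + (nothrd - 1) * length + 1
        i_end = min(n, iptr + nothrd * length)
        ranges.append((i_start, i_end))
    return ranges


if __name__ == "__main__":
    print(thread_rows(10, 2, 3))
```

With n=10, iptr=2 and three threads, the rows 3 to 10 are split into the ranges 3 to 5, 6 to 8 and 9 to 10.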

[0068]

In step S50, V(is:ie, i)=alpha×V(is:ie, i)-V(is:ie, 1:i-1)×(U(iptr3:n, 1:i-1)^t×U(iptr3:n, i))-U(is:ie, 1:i-1)×(V(iptr3:n, 1:i-1)^t×U(iptr3:n, i)) is computed. In step S51, barrier synchronization is taken. In step S52, tmp(nothrd)=V(is:ie, i)^t×U(is:ie, i) is computed. In step S53, barrier synchronization is taken. In step S54, it is determined whether nothrd=1. If the determination in step S54 is NO, the process proceeds to step S56. If the determination in step S54 is YES, the process proceeds to step S55. In step S55, beta=0.5×alpha×sum(tmp(1:numthrd)) is computed, where sum denotes the sum of the vector elements. In step S56, barrier synchronization is taken. In step S57, V(is:ie, i)=V(is:ie, i)-beta×U(is:ie, i) is computed. In step S58, barrier synchronization is taken. In step S59, it is determined whether iptr2<n-2. If YES, then in step S60 U(is:ie, i+1)=U(is:ie, i+1)-U(is:ie, istart:i)×V(i+1, istart:i)^t-V(is:ie, istart:i)×U(n+1, istart:i)^t is computed, and the process returns to step S39. If the determination in step S59 is NO, then in step S61 U(is:ie, i+1:i+2)=U(is:ie, i+1:i+2)-U(is:ie, istart:i)×V(i+1:n, istart:i)^t-V(is:ie, istart:i)×U(n+1:n, istart:i)^t is computed, and the subroutine is exited.
[0069]

FIG. 23 is a flowchart of subroutine update.

In step S65, barrier synchronization is taken. In step S66, the start and end positions of the part that each thread updates are computed. That is, nbase2=nbase+iblk, len=(n-nbase2+2×numthrd-1)/(2×numthrd), isl=nbase2+(nothrd-1)×len+1, iel=min(n, nbase2+nothrd×len), nbase3=nbase2+2×numthrd×len, isr=nbase3-nothrd×len+1 and ier=min(n, isr+len-1) are computed. In step S67, A(iel+1:n, isl:iel)=A(iel+1:n, isl:iel)-W(iel+1:n, 1:blk)×U(isl:iel, 1:blk)^t-U(iel+1:n, 1:blk)×W(isl:iel, 1:blk)^t and A(ier+1:n, isr:ier)=A(ier+1:n, isr:ier)-W(ier+1:n, 1:blk)×U(isr:ier, 1:blk)^t-U(ier+1:n, 1:blk)×W(isr:ier, 1:blk)^t are computed. In step S68, subroutine trupdate is called to update the lower triangular part of the diagonal block; isl, iel, A, W and U are passed. In step S69, subroutine trupdate is called to update the lower triangular part of the other diagonal block; isr, ier, A, W, [...]
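The pairing of a left range (isl, iel) with a right range (isr, ier) in step S66 can be checked numerically. The sketch below is an illustration under stated assumptions: the names mirrored_ranges and column_work are hypothetical, integer division is assumed, and the work of a column j is modelled simply as the n-j+1 entries it has on and below the diagonal. Under that model the pairing appears intended to give every thread a similar amount of lower-triangle work.

```python
def mirrored_ranges(n, nbase2, numthrd):
    """Return ((isl, iel), (isr, ier)) for each thread, 1-based inclusive.

    Hypothetical helper following the formulas of step S66: the columns
    after nbase2 are cut into 2*numthrd strips of width len; thread t
    takes strip t from the left and strip t counted from the right.
    """
    length = (n - nbase2 + 2 * numthrd - 1) // (2 * numthrd)
    nbase3 = nbase2 + 2 * numthrd * length
    pairs = []
    for nothrd in range(1, numthrd + 1):
        isl = nbase2 + (nothrd - 1) * length + 1
        iel = min(n, nbase2 + nothrd * length)
        isr = nbase3 - nothrd * length + 1
        ier = min(n, isr + length - 1)
        pairs.append(((isl, iel), (isr, ier)))
    return pairs


def column_work(n, lo, hi):
    """Entries on or below the diagonal in columns lo..hi (a rough cost model)."""
    return sum(n - j + 1 for j in range(lo, hi + 1))


if __name__ == "__main__":
    for left, right in mirrored_ranges(16, 0, 2):
        print(left, right, column_work(16, *left) + column_work(16, *right))
```

For n=16, nbase2=0 and two threads, thread 1 receives columns 1 to 4 and 13 to 16 and thread 2 receives columns 5 to 8 and 9 to 12; both totals come to 68 entries under the cost model above.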

[0070]

FIG. 24 is a flowchart of subroutine trupdate, which updates the lower triangular part of a diagonal block.
[0071]

In step S75, a suitable block width is set in blk2, and i=is is set. In step S76, it is determined whether i>ie-1. If the determination in step S76 is YES, the subroutine is exited. If the determination in step S76 is NO, then in step S77 the block is updated. That is, is2=i, ie2=min(i+blk2-1, ie-1) and A(is2:ie-1, is2:ie2)=A(is2:ie-1, is2:ie2)-U(is2:ie-1, 1:blk)×W(is2:ie2, 1:blk)^t-W(is2:ie-1, 1:blk)×U(is2:ie2, 1:blk)^t are computed. In step S78, i=i+blk2 is set, and the process returns to step S76.
[0072]

In step S80, the start position and the width of the part handled by each thread are computed. That is, len=(n-nbase+2×numthrd-1)/(2×numthrd), isl=nbase+(nothrd-1)×len+1, lenl=max(0, min(n-isl+1, len)), nbase3=nbase+2×numthrd×len, isr=nbase3-nothrd×len+1 and lenr=max(0, min(n-isr+1, len)) are computed. In step S81, subroutine bandcp is called; the start position isl and the width lenl are passed to it. [...]

[0073]

FIG. 26 is a flowchart of subroutine bandcp. This subroutine uses a small work area WX(nb, nb) and receives the start position is and the width len.

[0074]

In step S85, nn=min(nb, len), loopx=(len+nn-1)/nn and j=1 are set. In step S86, it is determined whether j>loopx. If the determination in step S86 is YES, the subroutine is exited. If the determination in step S86 is NO, then in step S87 the following are computed: ip=is+(j-1)×nn, nl=len-(j-1)×nn, nnx=min(nn, nl), len2=n-ip-nnx+1, loopy=(len2+nn-1)/nn, TRL(WX(1:nnx, 1:nnx))=TRL(A(ip:ip+nnx-1, ip:ip+nnx-1)), TRU(A(ip:ip+nnx-1, ip:ip+nnx-1))=TRL(WX(1:nnx, 1:nnx)), i=1, is2=ip and is3=ip+nnx. Here, TRU denotes the upper triangular part and TRL the lower triangular part.
[0075]

In step S88, it is determined whether i>loopy-1. If the determination in step S88 is NO, then in step S89 an nn×nnx block is copied in transposed form. That is, WX(1:nn, 1:nnx)=A(is3:is3+nn-1, is2:is2+nnx-1), A(is2:is2+nnx-1, is3:is3+nn-1)=WX(1:nn, 1:nnx)^t and is3=is3+nn are computed, and the process returns to step S88. If the determination in step S88 is YES, then in step S90 the last block is processed. That is, nn=n-is3+1, WX(1:nn, 1:nnx)=A(is3:n, is2:is2+nnx-1) and A(is2:is2+nnx-1, is3:n)=WX(1:nn, 1:nnx)^t are computed, and the process returns to step S86.
[0076]

FIG. 27 is a flowchart of subroutine convev. [...] The eigenvectors of the tridiagonal matrix are stored in ev(k, nev).
[0077]

[...] The total number of threads is set in numthrd, and the thread number (1 to numthrd) is set in nothrd. In step S96, barrier synchronization is taken. In step S97, the range handled by each thread is computed. That is, len=(nev+numthrd-1)/numthrd, is=(nothrd-1)×len+1, ie=min(nev, nothrd×len) and width=ie-is+1 are computed. In step S98, subroutine convevthrd is called [...]

[0078]

FIGS. 28 and 29 are flowcharts of subroutine convevthrd.
[0079]

In step S110, the block width is set in blk. The block width is about 80. In step S111, it is determined whether iwidth<0. If the determination in step S111 is YES, the subroutine is exited. If the determination in step S111 is NO, the process proceeds to step S112. In step S112, [...] (I + UBU^t) [...] That is, numblk=(n-2+blk-1)/blk and nfbs=n-2-blk×(numblk-1) are computed. In step S113, it is determined whether i<n-2-nfbs+1. If the determination in step S113 is YES, the process proceeds to step S114, where alpha=-a(i, i), x(1:iwidth)=a(i+1:n, i)^t×ev(i+1:n, 1:iwidth) and ev(i+1:n, 1:width)=ev(i+1:n, 1:width)+alpha×a(i+1:n, i)×x(1:iwidth)^t are computed, and the process returns to step S113. If the determination in step S113 is NO, then in step S115 i=1 is set, and in step S116 i is compared with numblk-1. [...]

In step S117, the block form (I + UBU^t) is prepared. That is, is=n-2-(nfbs+i×blk)+1, ie=is+blk-1, W(1:blk, iwidth)=a(ie+1:n, is:ie)^t×ev(ie+1:n, 1:iwidth) and W(1:blk-1, 1:iwidth)=W(1:blk-1, 1:iwidth)+TRL(a(is+1:ie, is:ie-1))^t×ev(is+1:ie, 1:iwidth) are computed. [...] diag(w2)=-diag(a(is:is+blk-1, is:is+blk-1)) and i2=blk are set. Here, TRL(w2) is the lower triangular part of w2, and diag(x) denotes the diagonal elements of x.
[0080]

In step S118, it is determined whether i2<1. If the determination in step S118 is NO, then in step S119 [...] i1=i2-1 is set. In step S120, it is determined whether i1<1. If NO, then in step S121 w2(i1, i2)=w2(i1, i1)×(a(is+i2:n, is+i2-1)^t×a(is+i2:n, is+i1-1)) is computed, i1=i1-1 is set, and the process returns to step S120. If the determination in step S120 is YES, then in step S122 i2=i2-1 is set, and the process returns to step S118. If the determination in step S118 is YES, then in step S123 i1=blk-2 is set [...]. In step S124, it is determined whether i1<1. If the determination in step S124 is NO, then in step S125 i2=blk is set, and in step S126 it is determined whether i2<i1+1. If the determination in step S126 is NO, then in step S127 [...] w2(i1, i2)=w2(i1, i2)+w2(i1, i1+1:i2-1)×w2(i1+1:i2-1, i2) is computed, i2=i2-1 is set, and the process returns to step S126. If the determination in step S126 is YES, then in step S128 i1=i1-1 is set, and the process returns to step S124.

If the determination in step S124 is YES, then in step S129 i2=blk is set, and in step S130 it is determined whether i2<1. If the determination in step S130 is NO, then in step S131 [...] i1=i2-1 is set, and in step S132 it is determined whether i1<1. If the determination in step S132 is NO, then in step S133 w2(i1, i2)=w2(i1, i2)×w2(i2, i2) is computed, i1=i1-1 is set, and the process returns to step S132. If the determination in step S132 is YES, then in step S134 i2=i2-1 is set, and the process returns to step S130.

If the determination in step S130 is YES, then in step S135 [...] W(1:blk, 1:iwidth)=TRU(w2)×W(1:blk, 1:iwidth) is computed. [...] (I + UBU^t)×EV is then computed. That is, ev(ie+1:n, 1:width)=ev(ie+1:n, 1:width)+a(ie+1:n, is:ie)×W(1:blk, 1:iwidth) and ev(is+1:ie, 1:width)=ev(is+1:ie, 1:width)+TRL(a(is+1:ie, is+1:ie))×W(1:blk-1, 1:iwidth) are computed, and the process returns to step S115.
[0082]

[...] Compared with the corresponding routine of the SUN Performance Library, a performance of about 4.8 times was obtained [...]. Compared with the SUN routine that uses the divide & conquer method, a performance of about 3.8 times was obtained.
[0083]

For the algorithms of the matrix computations, see the following reference.

G. H. Golub and C. F. Van Loan, "Matrix Computations", third edition, The Johns Hopkins University Press, 1996.

For the parallel processing of the tridiagonalization, see the following reference.

J. Choi, J. J. Dongarra, and D. W. Walker, "The Design of a Parallel Dense Linear Algebra Software Library: Reduction to Hessenberg, Tridiagonal, and Bidiagonal Form", Engineering Physics and Mathematics Division, Mathematical Sciences Section, prepared by the Oak Ridge National Laboratory, managed by Martin Marietta Energy Systems, Inc., for the U.S. Department of Energy under Contract No. DE-AC05-84OR21400, ORNL/TM-12472.
(#15 2) m^^m^\t^yf\z^x, 7n y ? itZfitz^ft^m

•7°n -fe y izmWZft 0 Z. t K X o T^S $ ft £ Cl t t f * #15 1 

[0 0 8 5] 
It $ ft £ £ £#M 1-^#f5 1 dlSicco rn^ 7 A 0
[0 0 8 7] 

(#15 6 ) *f«')I^* 9»Jft»1tffl@^P^S^»JM3l^^-e* 
^Iti"^ § H*t&fT#)£> & W± < - h ftm £ fu y >y it L , 7n y ? it L 

fe3M2J^lk£ft^T90^^^ft7t@;fr^ h/nfc, BfjEOrn ;/ 

nmnommn t & & £ ? ^m^ntz^ * *)v v-m.mz ± o -c^m l 



Reference No. 2003-3067306

Application No. 2003-092611



[0 0 8 8] 
[0 0 8 9]

(#IS9) mffSa^^ h ^^^f- y ytc&^T, Hu §2^ * ^ *~ 
[0 0 9 0] 
[FIG. 20]

A diagram (part 2) showing the pseudocode as a flowchart.
[FIG. 22]
A diagram (part 5) showing the pseudocode as a flowchart.
[FIG. 24]

A diagram (part 6) showing the pseudocode as a flowchart.
subroutine trid(a, k, n, diag, sdiag)

! The lower triangular part of the real symmetric matrix is stored in a.
! diag and sdiag return the diagonal and subdiagonal elements of the
! computed tridiagonal matrix.

  constant iblk <- 'set block width'
  shared array a(k,n), diag(n), sdiag(n)
  allocate shared array u(n+1,iblk), v(n+1,iblk)

c create threads
  create threads
  set nothrd and numthrd
c nothrd is the thread number (1 to #TH); numthrd = #TH, the total number of threads

  nb = (n-2+iblk-1)/iblk
  nbase = 0

  do i = 1, nb-1
    nbase = (i-1)*iblk
    istart = 1
    nwidth = iblk
    call copy(a, k, n, nbase, nothrd, numthrd)
c copy
    u(nbase+1:n, 1:iblk) <- a(nbase+1:n, nbase+1:nbase+iblk)
c tridiagonalize the block copied into u
    call blktrid(a, k, n, diag, sdiag, nbase, istart, nwidth,
                 u, v, nothrd, numthrd)
c copy back
    a(nbase+1:n, nbase+1:nbase+iblk) <- u(nbase+1:n, 1:iblk)
    call update(a, k, n, nbase, nwidth, u, v, nothrd, numthrd)
  end do

  nbase = (nb-1)*iblk
  istart = 1
  nwidth = n-nbase
c copy (step S20 of the description)
  u(nbase+1:n, 1:nwidth) <- a(nbase+1:n, nbase+1:n)
  call blktrid(a, k, n, diag, sdiag, nbase, istart, nwidth,
               u, v, nothrd, numthrd)
c copy back (step S22 of the description)
  a(nbase+1:n, nbase+1:n) <- u(nbase+1:n, 1:nwidth)

  return
end






[FIG. 13]



c nbase is the offset giving the position of the block; istart is the position,
c within the block, of the target sub-block obtained by recursive division

subroutine blktrid(a, k, n, diag, sdiag, nbase, istart, nwidth,
                   u, v, nothrd, numthrd)
  shared array a(k,n), diag(n), sdiag(n), u(n+1,*), v(n+1,*)

  if (nwidth < 10) then
    call btunit(a, k, n, diag, sdiag, nbase, istart, nwidth,
                u, v, nothrd, numthrd)
  else
    istart2 <- istart
    nwidth2 <- nwidth/2
    call blktrid(a, k, n, diag, sdiag, nbase, istart2, nwidth2,
                 u, v, nothrd, numthrd)
    BARRIER SYNC

    istart3 <- istart+nwidth/2
    nwidth3 <- nwidth-nwidth/2
    is2 <- istart2
    ie2 <- istart+nwidth2-1
    is3 <- istart3
    ie3 <- istart3+nwidth3-1
    iptr <- nbase+istart3
    len <- (n-iptr+numthrd-1)/numthrd
    is <- iptr+(nothrd-1)*len+1
    ie <- min(n, iptr+nothrd*len)

    u(is:ie, is3:ie3) <- u(is:ie, is3:ie3)
                         - u(is:ie, is2:ie2)*w(is3:ie3, is2:ie2)^t
                         - w(is:ie, is2:ie2)*u(is3:ie3, is2:ie2)^t
    BARRIER SYNC

    call blktrid(a, k, n, diag, sdiag, nbase, istart3, nwidth3,
                 u, v, nothrd, numthrd)
  endif

  return
end
c internal routine of blktrid

subroutine btunit(a, k, n, diag, sdiag, nbase, istart, nwidth,
                  u, v, nothrd, numthrd)
  shared :: a(k,n), diag(n), sdiag(n), u(n+1,*), v(n+1,*)
  shared :: tmp(numthrd), sigma, alpha

  if (nbase+istart > n-2) then
    return
  endif

  do i = istart, istart-1+nwidth
    iptr2 <- nbase+i
    len <- (n-iptr2+numthrd-1)/numthrd
    is <- iptr2+(nothrd-1)*len+1
    ie <- min(n, iptr2+nothrd*len)
    BARRIER SYNC

    tmp(nothrd) <- u(is:ie, i)^t * u(is:ie, i)
    BARRIER SYNC

    if (nothrd = 1) then
      sigma <- sqrt(sum(tmp(1:numthrd)))    ! SUM is the sum, sqrt the square root
      diag(iptr2) <- u(iptr2, i)
      sdiag(iptr2) <- -sigma
      u(nbase+i+1, i) <- u(nbase+i+1, i) + sign(u(nbase+i+1, i))*sigma
      alpha <- 1.0/(sigma*u(nbase+i+1, i))
      u(iptr2, i) = alpha
    endif
    BARRIER SYNC

    iptr3 = iptr2+1
    v(is:ie, i) <- a(iptr3:n, iptr2+is:iptr2+ie)^t * u(iptr3:n, i)
    BARRIER SYNC

    len2 <- (i-1+numthrd-1)/numthrd
    isx <- (nothrd-1)*len2+1
    iex <- min(i-1, nothrd*len2)
    u(n+1, isx:iex) <- u(nbase+i+1:n, isx:iex)^t * u(i+1:n, i)
    v(n+1, isx:iex) <- v(nbase+i+1:n, isx:iex)^t * u(i+1:n, i)
    BARRIER SYNC

    v(is:ie, i) <- alpha*(v(is:ie, i) - v(is:ie, 1:i-1)*u(n+1, 1:i-1)^t
                                      - u(is:ie, 1:i-1)*v(n+1, 1:i-1)^t)
    BARRIER SYNC

    tmp(nothrd) <- v(is:ie, i)^t * u(is:ie, i)
    BARRIER SYNC

    if (nothrd = 1) then
      beta <- 0.5*alpha*sum(tmp(1:numthrd))
    endif
    BARRIER SYNC

    v(is:ie, i) <- v(is:ie, i) - beta*u(is:ie, i)
    BARRIER SYNC

    if (i < iblk) then
      if (iptr2 < n-2) then
        u(is:ie, i+1) <- u(is:ie, i+1) - u(is:ie, istart:i)*v(i+1, istart:i)^t
                                       - v(is:ie, istart:i)*u(i+1, istart:i)^t
      else
        u(is:ie, i+1:i+2) <- u(is:ie, i+1:i+2)
                             - u(is:ie, istart:i)*v(n-1:n, istart:i)^t
                             - v(is:ie, istart:i)*u(n-1:n, istart:i)^t
        return
      endif
    endif
  enddo

  eliminate threads

  return
end
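The way sigma is formed in btunit, with one partial sum per thread stored in tmp followed by a single thread taking the square root of the total, can be sketched serially. The function name two_stage_norm is hypothetical, the threads are simulated by a plain loop, and the barrier synchronization of the listing is indicated only by a comment.

```python
import math


def two_stage_norm(x, numthrd):
    """Euclidean norm of x computed from per-thread partial sums.

    Hypothetical, serial stand-in for the pattern in the listing:
    each thread stores the sum of squares of its own slice in tmp,
    and after a barrier one thread reduces tmp and takes the root.
    """
    n = len(x)
    length = (n + numthrd - 1) // numthrd          # slice size, rounded up
    tmp = [0.0] * numthrd
    for nothrd in range(1, numthrd + 1):           # one iteration per simulated thread
        lo = (nothrd - 1) * length
        hi = min(n, nothrd * length)
        tmp[nothrd - 1] = sum(v * v for v in x[lo:hi])
    # BARRIER SYNC would go here in a threaded version
    return math.sqrt(sum(tmp))


if __name__ == "__main__":
    print(two_stage_norm([1.0, 2.0, 2.0], 2))
```

Keeping the reduction of tmp on a single thread means every thread afterwards reads the same sigma, which is the reason the listing places a barrier on both sides of that step.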
c routine that updates the lower triangular part using u and v
c nbase is the offset giving the position of the block; nwidth is the block width

subroutine update(a, k, n, nbase, nwidth, u, v, nothrd, numthrd)
  shared array a(k,n), u(n+1,*), v(n+1,*)

  BARRIER SYNC
  blk <- nwidth
  nbase2 <- nbase+nwidth
  len <- (n-nbase2+2*numthrd-1)/(2*numthrd)
  isl <- nbase2+(nothrd-1)*len+1
  iel <- min(n, nbase2+nothrd*len)
  nbase3 <- nbase2+2*numthrd*len
  isr <- nbase3-nothrd*len+1
  ier <- min(n, isr+len-1)

  a(iel+1:n, isl:iel) <- a(iel+1:n, isl:iel)
                         - w(iel+1:n, 1:blk)*u(isl:iel, 1:blk)^t
                         - u(iel+1:n, 1:blk)*w(isl:iel, 1:blk)^t
  a(ier+1:n, isr:ier) <- a(ier+1:n, isr:ier)
                         - w(ier+1:n, 1:blk)*u(isr:ier, 1:blk)^t
                         - u(ier+1:n, 1:blk)*w(isr:ier, 1:blk)^t

  call trupdate(a, k, n, isl, iel, u, v, blk)
  call trupdate(a, k, n, isr, ier, u, v, blk)

  BARRIER SYNC

  return
end



subroutine trupdate(a, k, n, is, ie, u, v, blk)
  constant blk2 <- 'set a suitable block width'
  shared array a(k,n), u(n+1,*), v(n+1,*)

  do i = is, ie, blk2
    is2 <- i
    ie2 <- min(i+blk2-1, ie)
    a(is2:ie, is2:ie2) <- a(is2:ie, is2:ie2)
                          - w(is2:ie, 1:blk)*u(is2:ie2, 1:blk)^t
                          - u(is2:ie, 1:blk)*w(is2:ie2, 1:blk)^t
  enddo

  return
end
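Read together, update and trupdate apply one symmetric correction to the trailing part of the matrix. Written in matrix form, as a restatement of the listing's own expressions in which U and W stand for the blk-column panels named u and w there:

```latex
A \;\leftarrow\; A - W U^{t} - U W^{t},
\qquad
\left( W U^{t} \right)^{t} = U W^{t}
```

Because the two correction terms are transposes of each other, their sum is symmetric, so it is enough to compute the lower triangular part, which is what the column strips of update and the diagonal blocks of trupdate cover.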

subroutine copy(a, k, n, nbase, nothrd, numthrd)

  len <- (n-nbase+2*numthrd-1)/(2*numthrd)
  isl <- nbase+(nothrd-1)*len+1
  lenl = max(0, min(n-isl+1, len))
  nbase3 <- nbase+2*numthrd*len
  isr <- nbase3-nothrd*len+1
  lenr = max(0, min(n-isr+1, len))

  call bandcp(a, k, n, isl, lenl)
  call bandcp(a, k, n, isr, lenr)

  return
end
subroutine bandcp(a, k, n, is, len)
  constant nb <- size of small buffer
  private w(nb,nb)

  nn <- min(nb, len)
  loopx <- (len+nn-1)/nn

  do j = 1, loopx
    ip <- is+(j-1)*nn
    nl <- len-(j-1)*nn
    nnx <- min(nn, nl)
    len2 <- n-ip+1
    loopy <- (len2+nnx-1)/nnx
    is2 = is+(j-1)*nnx

    TRL(w(1:nnx, 1:nnx)) <- TRL(a(is2:is2+nnx-1, is2:is2+nnx-1))
    TRU(a(is2:is2+nnx-1, is2:is2+nnx-1)) <- TRL(w(1:nnx, 1:nnx))^t

    do i = 2, loopy-1
      is3 <- is2+(i-1)*nnx
      w(1:nnx, 1:nnx) <- a(is3:is3+nnx-1, is2:is2+nnx-1)
      a(is2:is2+nnx-1, is3:is3+nnx-1) <- w(1:nnx, 1:nnx)^t
    enddo

    is3 <- is2+(loopy-1)*nnx
    ny <- n-is3+1
    w(1:ny, 1:nnx) <- a(is3:n, is2:is2+nnx-1)
    a(is2:is2+nnx-1, is3:n) <- w(1:ny, 1:nnx)^t
  enddo

  return
end
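The idea behind bandcp, mirroring the lower triangle into the upper triangle one small block at a time through a private buffer, can be shown with a simplified sketch. It is not a transcription of the listing: the function name mirror_lower_to_upper is hypothetical, indices are 0-based, the whole matrix is handled by one thread instead of a per-thread column range, and the matrix is a plain list of lists.

```python
def mirror_lower_to_upper(a, nb):
    """Copy the strict lower triangle of square matrix a into its upper triangle.

    Hypothetical, single-threaded sketch. Each block of at most nb x nb
    entries is first staged in a small buffer and then written back in
    transposed position, so reads and writes stay within small blocks.
    """
    n = len(a)
    for j0 in range(0, n, nb):                 # column panel
        j1 = min(j0 + nb, n)
        for i0 in range(j0, n, nb):            # blocks on or below the diagonal
            i1 = min(i0 + nb, n)
            buf = [a[i][j0:j1] for i in range(i0, i1)]   # stage the block
            for bi, i in enumerate(range(i0, i1)):
                for bj, j in enumerate(range(j0, j1)):
                    if i > j:                  # strict lower part only
                        a[j][i] = buf[bi][bj]  # transposed write
    return a


if __name__ == "__main__":
    m = [[1, 0, 0], [2, 3, 0], [4, 5, 6]]
    print(mirror_lower_to_upper(m, 2))
```

The diagonal block needs the i > j test because only its lower half carries data; blocks strictly below the diagonal are copied whole.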






[FIG. 17]



c converts the eigenvectors of the tridiagonal matrix (stored in ev(1:n,1:nev))
c into the eigenvectors of the original matrix

subroutine convev(a, k, n, ev, nev)
  shared array a(k,n), ev(k,n)

c create threads
c set nothrd and numthrd
c nothrd is the thread number (1 to #TH); numthrd = #TH, the total number of threads

  BARRIER SYNC

  len <- (nev+numthrd-1)/numthrd
  is <- (nothrd-1)*len+1
  ie <- min(nev, nothrd*len)
  nevthrd <- max(ie-is+1, 0)

  call convevthrd(a, k, n, ev(1,is), nevthrd)
  BARRIER SYNC

  return
end
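The division of the nev eigenvector columns among the threads in convev can be tabulated with a few lines. The function name eigenvector_share is hypothetical and integer division is assumed; the max(..., 0) guard of the listing matters when there are more threads than columns to hand out.

```python
def eigenvector_share(nev, numthrd):
    """Return (first_column, column_count) for each thread, 1-based.

    Hypothetical helper implementing
      len     = (nev+numthrd-1)/numthrd
      is      = (nothrd-1)*len+1
      ie      = min(nev, nothrd*len)
      nevthrd = max(ie-is+1, 0)
    for nothrd = 1 .. numthrd.
    """
    length = (nev + numthrd - 1) // numthrd
    shares = []
    for nothrd in range(1, numthrd + 1):
        first = (nothrd - 1) * length + 1
        last = min(nev, nothrd * length)
        shares.append((first, max(last - first + 1, 0)))
    return shares


if __name__ == "__main__":
    print(eigenvector_share(5, 4))
```

With five eigenvectors and four threads the first three threads receive 2, 2 and 1 columns and the fourth receives none, which is the case the zero lower bound is there to handle.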
subroutine convevthrd(a, k, n, ev, iwidth)
  constant blk <- block width
  shared array a(k,n)
  array ev(k,*)
  private w(blk,n), w2(blk,blk)

  if (iwidth < 0) then
    return
  endif

  numblk = (n-2+blk-1)/blk
  nfbs <- n-2-blk*(numblk-1)

  do i = n-2, n-2-nfbs+1, -1
    alpha <- a(i,i)
    x(1:iwidth) <- ev(i+1:n, 1:iwidth)^t * a(i+1:n, i)
    ev(i+1:n, 1:iwidth) <- ev(i+1:n, 1:iwidth) + alpha*a(i+1:n, i)*x(1:iwidth)^t
  enddo

  do i = 1, numblk-1
    is <- n-2-(nfbs+i*blk)+1
    ie <- is+blk-1

    w(1:blk, iwidth) <- a(is+1:n, is:ie)^t * ev(is+1:n, 1:iwidth)
    w(1:blk-1, 1:iwidth) <- w(1:blk-1, 1:iwidth)
                            + TRL(a(ie+1:is, is:ie))^t * ev(ie+1:is, 1:iwidth)

    DIAG(w2) <- DIAG(a(is:ie, is:ie))      ! DIAG denotes the diagonal elements of a matrix

    do i2 = blk, 1, -1
      do i1 = i2-1, 1, -1
        w2(i1, i2) <- w2(i1, i1) * (a(is+i2:n, is+i2-1)^t * a(is+i2:n, is+i1-1))
      enddo
    enddo

    do i1 = blk-1, 1, -1
      do i2 = blk, i1+1, -1
        w2(i1, i2) <- w2(i1, i2) + w2(i1, i1+1:i2-1)*w2(i1+1:i2-1, i2)
      enddo
    enddo

    do i2 = blk, 1, -1
      do i1 = i2-1, 1, -1
        w2(i1, i2) <- w2(i1, i2)*w2(i2, i2)
      enddo
    enddo

    w(1:blk, 1:iwidth) <- w(1:blk, 1:iwidth) + TRU(w2)*w(1:blk, 1:iwidth)   ! TRU is the upper triangular matrix
    ev(is+1:n, 1:iwidth) <- a(is+1:n, is:ie)*w(1:blk, 1:iwidth)
    ev(ie+1:is, 1:iwidth) <- TRL(a(ie+1:is, is:ie-1))*w(1:blk-1, 1:iwidth)
  enddo

  return
end
[FIG. 19] Flowchart of subroutine trid (steps S10 to S23).
[FIG. 20] Flowchart of subroutine blktrid (steps S25 onward).
[FIG. 21] Flowchart of subroutine btunit, first part (steps S35 to S48).
[FIG. 22] Flowchart of subroutine btunit, second part (steps S49 to S61).
[FIG. 23] Flowchart of subroutine update (steps S65 to S70).
[FIG. 24] Flowchart of subroutine trupdate (steps S75 to S78).
[FIG. 25] Flowchart of subroutine copy (steps S80 to S82).
[FIG. 26] Flowchart of subroutine bandcp (steps S85 to S90).
[FIG. 27] Flowchart of subroutine convev (steps S95 to S100).
[FIG. 28] Flowchart of subroutine convevthrd, first part (steps S110 to S122).
[FIG. 29] Flowchart of subroutine convevthrd, second part (steps S123 to S135).